Speech retrieval with video parsing for television news programs

Authors

  • Helen M. Meng
  • Xiaoou Tang
  • Pui-Yu Hui
  • Xinbo Gao
  • Yuk-Chi Li
Abstract

We have been working on speech retrieval from Chinese (Cantonese) television news programs. The use of automatic speech recognition for audio indexing produces imperfect transcriptions, and recognition errors affect retrieval performance. A news story typically contains a brief report by the anchor person(s) in the studio, as well as news footage from the field. Investigation shows that our recognizer performs better when indexing audio from the studio, compared to that from the field. In order to automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our study is based on 146 news stories collected from local television Cantonese news programs. We formulated a known-item retrieval task and adopted the average inverse rank (AIR) as our evaluation metric. Retrieval is performed based on syllable bigram units, augmented with skipped syllable bigrams. Retrieval using the entire audio track of each news story gave AIR=0.759. With the incorporation of video parsing, we performed retrieval based only on the studio recordings, which produced AIR=0.768.
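The two retrieval ingredients named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that AIR is the mean of 1/rank of the known item over all queries, and that a "skipped" bigram pairs two syllables with one intervening syllable left out. The function names, the `max_skip` parameter, and the romanized example syllables are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def syllable_bigrams(syllables: Sequence[str],
                     max_skip: int = 1) -> List[Tuple[str, str]]:
    """Return adjacent syllable bigrams plus skipped bigrams.

    Assumption: a skipped bigram pairs syllable i with syllable i + gap,
    leaving out up to `max_skip` syllables in between. The skip distance
    used by the authors is not given in the abstract.
    """
    units: List[Tuple[str, str]] = []
    # gap == 1 yields ordinary bigrams; larger gaps yield skipped bigrams.
    for gap in range(1, max_skip + 2):
        for i in range(len(syllables) - gap):
            units.append((syllables[i], syllables[i + gap]))
    return units


def average_inverse_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over all queries in a known-item retrieval task.

    Assumption: `ranks` holds, for each query, the 1-based rank at which
    the single correct news story was retrieved.
    """
    if not ranks:
        raise ValueError("at least one query rank is required")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be positive")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Illustrative syllable sequence, not taken from the paper's data.
    query = ["hoeng", "gong", "san", "man"]
    print(syllable_bigrams(query))
    # Three queries whose target stories were ranked 1st, 2nd and 4th.
    print(average_inverse_rank([1, 2, 4]))
```

Under this definition an AIR of 1.0 means every known item was ranked first, so the reported values of 0.759 and 0.768 indicate that the target story usually appears at or near the top of the ranked list. Skipped bigrams give an indexing unit a chance to match even when the recognizer inserts or substitutes a syllable between two correct ones.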


Similar resources

Automatic Story Segmentation for Spoken Document Retrieval

We have been working on speech retrieval based on Cantonese television news programs. Our video archive contains over 20 hours of news programs provided by a local television station. These programs have been hand-segmented into video clips, where each clip is a self-contained news story. The audio tracks in our archive are indexed by Cantonese speech recognition. This is integrated with a vect...

AT_TV: Broadcast Television and Radio Retrieval

This paper reports recent work at AT&T Laboratories Cambridge to develop retrieval systems for broadcast television and radio programmes. Unlike some other systems, it does not rely on manual classification or annotation of the broadcast material; it is indexed automatically from the air. While many digital video library projects focus solely on broadcast news, we have broadened our efforts to ...

Parsing video programs into individual segments using FSA modeling

Parsing video programs into program segments is useful for retrieving individual segments and for video summarization. Many classes of video have a structure that can be effectively modeled using Finite-State Automata (FSA). Each video segment, such as a newscaster sequence or a weather sequence, becomes a node in the FSA. A transition is fired from one node to another based on arc conditio...

P1: Negative Television and Memory

According to reports on about 30 thousand people, time spent watching television had an impact on memory and recall, and the results showed no differences between men and women. People who watched less than an hour a day did better at every memory function. As these contributors watched negative political ads, physiological responses indicated that their bodies were reflexively preparing to move...

Audio-visual segmentation for content-based retrieval

This paper reports recent work at ORL on segmentation of digital audio/video recordings. Firstly, we describe an audio segmentation algorithm that partitions a soundtrack into manageably sized segments for speech recognition. Secondly, we present an algorithm for detecting camera shot-break locations in the video. The output of these two algorithms is combined to produce a semantically meaningf...

Publication date: 2001